Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(sampler): consolidate sampling interface, part 1 #5312

Merged
merged 30 commits into from
Sep 7, 2022

Conversation

mananshah99
Copy link
Contributor

@mananshah99 mananshah99 commented Aug 30, 2022

This PR begins the effort to consolidate PyG's sampling interface in preparation for moving sample(...) behind the GraphStore interface. This effort is somewhat large in scope and will be broken into multiple PRs for ease of review. There are many TODOs left in this PR; all will be resolved in forthcoming PRs (the code will become cleaner in short order).

This first change moves NeighborSampler behind a common base class with clearly defined inputs and outputs. Many more changes will be required before this base class defines a stable interface (e.g. LinkNeighborSampler has different semantics, NeighborSampler.perm_dict is used in NeighborLoader, etc.). But for now, it will act as a first step towards this goal.

It further refactors loader/utils.py into those that are required for sampling (moved into sampler/utils.py) and those that are required for filtering (remain in loader/utils.py).

Note about integration with GraphStore. Moving sampling behind the graph store requires defining a clear interface for sampling across PyG. Since a graph store should not inherently be coupled with a sampling method used (e.g. one can imagine using a NeighborLoader and LinkNeighborLoader on the same graph store), we must define this interface independently from the GraphStore. One can then imagine the GraphStore defining supported samplers for different loaders (along with the defaults, which sample in-memory), and those samplers are then seamlessly used in the loaders.

@mananshah99 mananshah99 changed the base branch from master to remote_backend_1 August 30, 2022 07:00
@codecov
Copy link

codecov bot commented Aug 30, 2022

Codecov Report

Merging #5312 (873f403) into master (5b1051f) will increase coverage by 0.04%.
The diff coverage is 97.67%.

@@            Coverage Diff             @@
##           master    #5312      +/-   ##
==========================================
+ Coverage   83.33%   83.38%   +0.04%     
==========================================
  Files         338      342       +4     
  Lines       18678    18745      +67     
==========================================
+ Hits        15565    15630      +65     
- Misses       3113     3115       +2     
Impacted Files Coverage Δ
torch_geometric/data/graph_store.py 92.85% <ø> (ø)
torch_geometric/loader/link_neighbor_loader.py 93.98% <66.66%> (-0.49%) ⬇️
torch_geometric/sampler/utils.py 94.44% <94.44%> (ø)
torch_geometric/sampler/base.py 96.00% <96.00%> (ø)
torch_geometric/loader/hgt_loader.py 91.66% <100.00%> (+1.10%) ⬆️
torch_geometric/loader/neighbor_loader.py 90.80% <100.00%> (-3.97%) ⬇️
torch_geometric/loader/utils.py 82.75% <100.00%> (-2.20%) ⬇️
torch_geometric/sampler/__init__.py 100.00% <100.00%> (ø)
torch_geometric/sampler/neighbor_sampler.py 100.00% <100.00%> (ø)

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Base automatically changed from remote_backend_1 to master August 30, 2022 22:26
@abstractmethod
def __init__(
self,
data: Union[Data, HeteroData, Tuple[FeatureStore, GraphStore]],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are going to get painful to continue to support.. do we have "data to graphstore"? Can we assume that you will always convert to the new representation? Maybe that doesn't make sense...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is a great point. We do plan to consolidate behind Tuple[FeatureStore, GraphStore] in the near future, but probably in a separate PR.

@mananshah99 mananshah99 marked this pull request as ready for review September 6, 2022 23:00
@mananshah99 mananshah99 changed the title draft: consolidate sampling interface in preparation for moving into GraphStore feat(sampler): consolidate sampling interface, part 1 Sep 6, 2022
@mananshah99 mananshah99 requested a review from a team September 6, 2022 23:52
@mananshah99 mananshah99 changed the title feat(sampler): consolidate sampling interface, part 1 refactor(sampler): consolidate sampling interface, part 1 Sep 7, 2022
Copy link
Member

@rusty1s rusty1s left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

CHANGELOG.md Outdated Show resolved Hide resolved
torch_geometric/sampler/__init__.py Show resolved Hide resolved
torch_geometric/sampler/__init__.py Show resolved Hide resolved
torch_geometric/sampler/base.py Outdated Show resolved Hide resolved
# * edge: a tensor of the indices in the original graph; e.g. to be used to
# obtain edge attributes.
class SamplerOutput(NamedTuple):
node: Union[torch.Tensor, Dict[NodeType, torch.Tensor]]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to add an optional batch attribute which assigns each node to an example. This will be necessary to integrate with the new pyg-lib implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a TODO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done; although this may become more complicated once we try to simplify 'metadata'. Can revert later if necessary.

torch_geometric/sampler/base.py Outdated Show resolved Hide resolved
torch_geometric/sampler/base.py Outdated Show resolved Hide resolved
torch_geometric/sampler/base.py Outdated Show resolved Hide resolved
torch_geometric/sampler/neighbor_sampler.py Outdated Show resolved Hide resolved
torch_geometric/sampler/utils.py Outdated Show resolved Hide resolved
@mananshah99 mananshah99 merged commit d111e41 into master Sep 7, 2022
@mananshah99 mananshah99 deleted the remote_backend_2 branch September 7, 2022 19:59
mananshah99 added a commit that referenced this pull request Sep 8, 2022
… 2 (#5365)

This PR begins the effort to consolidate PyG's sampling interface in preparation for moving `sample(...)` behind the `GraphStore` interface. This effort is somewhat large in scope and will be broken into multiple PRs for ease of review. It resolves some TODOs from #5312, and introduces others that will be resolved in follow-up PRs. 

The major change removes `LinkNeighborSampler`, consolidating it under a the common `BaseSampler` interface with `NeighborSampler`. In doing so, it modifies the `BaseSampler` interface to support `sample_from_nodes` and `sample_from_edges`, since oftentimes edge-based sampling applies some transformations and then re-uses the same logic as node-based sampling. A sampler need not define both node-based and edge-based sampling, but if it doesn't, an appropriate error will be raised.

Resolved TODOs:
* `LinkNeighborSampler` and `NeighborSampler` interfaces are aligned
* No more special handling of `edge_type_to_str` in `filter_*`
* No more special handling of `perm_dict`, which removes the need for `edge_type_to_str` entirely
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants